@rwells1961
Course Goal: Students will learn the latest data journalism techniques that drive modern newsrooms and public relations / advertising offices. The class will extract and analyze Twitter data with the goal of producing an interactive multimedia presentation.
Course Description: This course will teach students how to code in languages such as R and SQL and how these powerful tools are used in modern news reporting. Quality reporting in newsrooms requires a solid foundation of data analysis. The data skills taught in this class are in high demand in newsrooms and corporations.
Required Text: Machlis, Sharon. Practical R for Mass Communications and Journalism. Chapman & Hall/CRC The R Series. 2018. ISBN 9781138726918 https://www.amazon.com/gp/search?keywords=9781138726918
–
Agenda: –1/14/2019 - Week 1
–Email to students
–Discuss syllabus
–Intro to R and RStudio. Open program.
Machlis website:
http://www.machlis.com/R4Journalists/index.html
–R interface explained: Four main windows
Script writing, R Markdown, Table Viewer: Upper Left
Environment - data loaded in R: Upper Right
Console - write commands in R: Lower Left
File Manager and Html table viewer: Bottom Right
–Show basic R skills.
–Loading software. Tidyverse
Rio
–Conventions in coding.
–How many rows? nrow(yourdataset)
–How many columns? ncol(yourdataset)
–What is in the first rows? head(yourdataset) shows the first six rows by default; head(yourdataset, 5) shows the first five
–rename columns. create columns
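The basic skills listed above can be sketched in base R as follows. This is a minimal illustration, not the class tutorial: the data frame and its column names (name, value, amount, double_amount) are invented stand-ins for yourdataset.

```r
# Toy data frame standing in for "yourdataset" (values invented for illustration)
yourdataset <- data.frame(
  name  = c("A", "B", "C", "D", "E", "F", "G"),
  value = c(10, 20, 30, 40, 50, 60, 70)
)

nrow(yourdataset)      # number of rows
ncol(yourdataset)      # number of columns
head(yourdataset)      # first six rows (the default)
head(yourdataset, 5)   # first five rows

# Rename a column: replace the matching entry in names()
names(yourdataset)[names(yourdataset) == "value"] <- "amount"

# Create a column: assign to a new name with $
yourdataset$double_amount <- yourdataset$amount * 2
```

The tidyverse offers rename() and mutate() for the last two steps; the base R forms are shown here only because they need no packages.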
–Misc
Class intros
Book - Amazon
Installation issues on laptops?
Twitter feeds: @rstudiotips
–Run demos from Ch. 3
–Show Collins results
–Twitter analysis of Trump Tweets http://varianceexplained.org/r/trump-tweets/
–Ch 1 & 2 of Machlis: Key Points
Reproducible research
Repetitive tasks in modern newsrooms. Employment reports, crime stats, budgets
Variables - an R object
Assignment operator <-
Case sensitive
Vector: A vector can only have one type of data - all integers or all strings, for example
Dataframe - like a spreadsheet
Save files - Don’t save the workspace, because all of your variables will be stored and re-loaded the next time you launch RStudio. It’s too easy to forget about previously stored variables that can interfere with later work.
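A short sketch of the key points above (assignment, case sensitivity, vectors, data frames). The variable names and values are made up for illustration and do not come from Machlis.

```r
# Assignment operator: store a value in a variable (an R object)
county <- "Benton"

# R is case sensitive: county and County are different names
exists("county")   # TRUE, because it was just created

# A vector holds one type; mixing types coerces everything to a common type
ages  <- c(21L, 34L, 45L)        # integer vector
mixed <- c(21L, "thirty-four")   # the integer is converted to the string "21"
class(ages)    # "integer"
class(mixed)   # "character"

# A data frame is like a spreadsheet: each column is a vector
people <- data.frame(name = c("Ann", "Bo", "Cy"), age = ages)
str(people)
```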
–Software packages: tidyverse, rio, pacman
–Software: How to get details and help
help(package="dplyr")
browseVignettes("NameOfPackage")
help("NameOfFunction")
??median
–Data Types and R
Machlis: 2.4.2 Data types you’re likely to use often
–EXERCISES: Excel vs R
–Load tutorial: Introduction-to-R-January-2019.R > Download this file and open it in RStudio: “Ctrl” + click for a New Tab
–Keyboard Shortcuts
Tab - Autocomplete
Control (or Command) + UP arrow - last lines run
Control (or Command) + Enter - Runs current or selected lines of code
in the top left box of RStudio
Shift + Control (or Command) +P - Reruns previous region code
–In Class Exercise:
1. Percentage change from 2010-2017.
2. Produce a table with 5 counties with most growth.
3. Produce table with 5 counties with greatest population loss
4. Graph the top 5 and bottom 5
5. Filter just Benton County’s population for 2015
6. And if you finish that, bring up AOC.csv. How many rows? How many columns?
7. AOC.csv: filter the text field for “Pelosi” or “Trump” or “New Deal”
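For exercise 7, one possible approach is grepl() with a pattern that joins the search terms with "|" (OR). This is only a sketch: the data frame below is an invented placeholder, and the column names id and text are assumptions, since the real AOC.csv may be laid out differently.

```r
# Invented stand-in for AOC.csv; the real file's columns may differ
tweets <- data.frame(
  id   = 1:4,
  text = c("placeholder text mentioning Pelosi",
           "placeholder text with no search term",
           "placeholder text mentioning a New Deal",
           "placeholder text mentioning Trump"),
  stringsAsFactors = FALSE
)

# "|" means OR in a regular expression; grepl() returns TRUE/FALSE per row
pattern <- "Pelosi|Trump|New Deal"
matches <- tweets[grepl(pattern, tweets$text), ]

nrow(matches)   # how many rows mention at least one term
```

grepl() is case sensitive by default; adding ignore.case = TRUE would also catch lowercase spellings.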
For Monday: Import Income data from US Census
**Notes:**
–Basic descriptive statistics
–Review ComputerWorld’s Beginner’s Guide To R
–Stack Overflow at stackoverflow.com
Reading:
–Machlis. Chapter 1 & 2.
–Beginner’s guide to R: https://www.computerworld.com/article/2497143/business-intelligence/business-intelligence-beginner-s-guide-to-r-introduction.html
–Twitter analysis of Trump Tweets http://varianceexplained.org/r/trump-tweets/
–Review another R tutorial https://docs.google.com/presentation/d/1zICxR7qDM3RQ2Nxi5CqHlM3H8I7qoVkNtqcNcnbbDCw/edit#slide=id.p
Resources: RStudio Navigation Tricks You Might’ve Missed https://rviews.rstudio.com/2016/11/11/easy-tricks-you-mightve-missed/
How Do I? https://smach.github.io/R4JournalismBook/HowDoI.html
Functions https://smach.github.io/R4JournalismBook/functions.html
Packages https://smach.github.io/R4JournalismBook/packages.html –
Agenda: –1/21/2019 - Week 2
–Use R instead of Excel: Andrew Ba Tran
Excellent Tutorial Spelling out Excel and Comparable Commands in R
https://trendct.org/2015/06/12/r-for-beginners-how-to-transition-from-excel-to-r/
Basic data work- head to http://bit.ly/excel_and_r
–Export data: write this file out to a CSV with write.csv(), or to Excel with a package function such as export() from rio (base R has no write.excel). Example: write.csv(AR2016_SMALL, "AR2016_SMALL.csv")
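A minimal sketch of the export step, assuming a small invented data frame in place of the real AR2016_SMALL. It writes to a temporary folder and reads the file back to confirm the export worked.

```r
# Small invented data frame standing in for AR2016_SMALL
AR2016_SMALL <- data.frame(county = c("Benton", "Washington"),
                           value  = c(1, 2))

# Write to a temporary folder so the example does not clutter the project
out_file <- file.path(tempdir(), "AR2016_SMALL.csv")

# row.names = FALSE keeps R's row numbers out of the file
write.csv(AR2016_SMALL, out_file, row.names = FALSE)

# Read it back to confirm the export worked
check <- read.csv(out_file)
```

In a real project you would replace out_file with a file name in your working directory, as in the write.csv() line above.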
–More on Data Types and R
Downloading Data 12-24-18.R
–Basic data visualization Basic Data Visualization 12-26-18.R Basic-Chart-January-2018.R
–Work with sample Twitter data
Exercise Median Income For City.R Basic crime rate in R exercise.txt Downloading Data 12-24-18.R
–Converting character strings into numeric
–Change column to number format (first you have to strip out the $)
–The $ is a special character, so it is escaped as \\$ in the pattern
– earnings$TOTAL.EARNINGS <- gsub("\\$", "", earnings$TOTAL.EARNINGS)
–Function to change the format to numeric
– earnings$TOTAL.EARNINGS <- as.numeric(earnings$TOTAL.EARNINGS)
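The two steps above can be tried on a toy column. The values below are invented. Stripping commas as well as dollar signs is an addition to the notes, included because as.numeric() returns NA for strings such as "1,200".

```r
# Invented earnings values stored as text, as they often arrive from a CSV
earnings <- data.frame(TOTAL.EARNINGS = c("$1,200.50", "$980", "$15,000"),
                       stringsAsFactors = FALSE)

# "[$,]" is a character class matching either a dollar sign or a comma;
# inside square brackets the $ needs no backslashes
earnings$TOTAL.EARNINGS <- gsub("[$,]", "", earnings$TOTAL.EARNINGS)

# Now the strings contain only digits and a decimal point
earnings$TOTAL.EARNINGS <- as.numeric(earnings$TOTAL.EARNINGS)

class(earnings$TOTAL.EARNINGS)   # "numeric"
sum(earnings$TOTAL.EARNINGS)     # math now works on the column
```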
Notes –Loading and basic file management
Bringing in data
Data Frames
Extracting interesting details
Cleaning the data
Reshaping the format
Manipulating the data
Exporting
–How read.table() works for importing data:
Loading data
RSQLite - read data from a database
xlsx - read in Excel spreadsheets
Manipulating data
dplyr - fast data work
stringr - work with strings
Data Management
mutate - Create new column(s) in the data, or change existing column(s).
rename - Rename column(s).
bind_rows - Merge two data frames into one, combining data from columns with the same name.
Math
–Summary Statistics
summary(Crime)
mean(x) Calculate the mean, or average, for variable x.
median(x) Calculate the median.
max(x) Find the maximum value.
min(x) Find the minimum value.
sum(x) Add all the values together.
n() Count the number of records (a dplyr function, used inside summarize()). Here there isn’t a variable in the brackets of the function, because the number of records applies to all variables.
n_distinct(x) Count the number of unique values in variable x (also from dplyr).
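A quick way to try the summary functions is on a small vector. The numbers below are arbitrary. Because n() and n_distinct() come from dplyr, this sketch uses their base R counterparts, length() and length(unique()).

```r
x <- c(4, 8, 15, 16, 23, 42, 8)

mean(x)             # average
median(x)           # middle value when sorted
max(x)              # largest value
min(x)              # smallest value
sum(x)              # total
length(x)           # base R count of values
length(unique(x))   # base R count of distinct values
summary(x)          # min, quartiles, median, mean, max in one call
```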
–Using a function for an equation
percent_change <- function(first_number, second_number) {
  pc <- (second_number - first_number) / first_number * 100
  return(pc)
}
percent_change(100, 150)
[1] 50
This is what’s happening in the code above:
* percent_change is the name of the function, and assigned to it is the function function()
* Two variables must be passed to this function, first_number and second_number
* A new object pc is created by calculating percent change from the two variables passed to it
* The function return() hands the value of pc back to whatever called percent_change()
Build enough functions and you can save them as your own package.
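Because R arithmetic works on whole columns at once, the same function can fill a new column in one step. The sketch below uses invented county names and population figures, and the column names pop2010 and pop2017 are assumptions, not the class dataset.

```r
percent_change <- function(first_number, second_number) {
  (second_number - first_number) / first_number * 100
}

# Invented populations for three hypothetical counties
pop <- data.frame(
  county  = c("County A", "County B", "County C"),
  pop2010 = c(1000, 2000, 4000),
  pop2017 = c(1100, 1900, 5000)
)

# The function is applied row by row to the two columns
pop$pct_change <- percent_change(pop$pop2010, pop$pop2017)

# Sort from most growth to most loss
pop[order(-pop$pct_change), ]
```

Taking head() of the sorted result would give a "top counties" table, and sorting in the other direction would give the counties with the greatest loss.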
Visualizing data
ggplot2 - charts and maps
htmlwidgets - web visualization interactives
plotly - exporting charts online
–Data Wrangling-Text Mining in Twitter. –See entire scraping sequence. Extract from Twitter. –Chart –Export Static
Reading:
–Machlis. Chapter 3 & 4.
–Study Twitter meta data
https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object.html
–Cohen, “Numbers in the Newsroom,” Common Mistakes.
–String data manipulation https://dereksonderegger.github.io/570L/13-string-manipulation.html
Resources
–All Cheat Sheets https://www.rstudio.com/resources/cheatsheets/
–
Agenda –1/28/2019 - Week 3
–Quiz on Basic R functions
–Describe Assignment #1: Managing Data / Static Graphic
–Filters, Grouping, Sorting
–Add a column with a math conversion
–Lubridate
–Tidyverse
–Setting up an R Workflow
–Twitter Metadata
–Work with sample Twitter data
Notes
–dplyr: Five basic verbs
• filter()
• select()
• arrange()
• mutate()
• summarize()
plus group_by()
–Setting up an R Workflow http://learn.r-journalism.com/en/publishing/workflow/r-projects/
–Set up column for math calculations Example: Total column shows winter snowfall in inches. To add a column showing totals in meters, you can use this format:
snowdata$Meters <- snowdata$Total * 0.0254
–Pipes - a Much-Used Command to Link Filters, Functions
pipe %>%
CMD + Shift + M (Ctrl + Shift + M on Windows)
–Presentation from Bob Rudis on Writing Readable Code with Pipes, delivered at the rstudio::conf 2017. https://www.rstudio.com/resources/videos/writing-readable-code-with-pipes/ Pipes as a way of chaining commands. object %>% operation() —> result
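As a rough illustration of chaining the dplyr verbs with pipes, the sketch below uses mtcars, a dataset that ships with R, rather than the class Twitter data. It assumes the dplyr package is installed. The names wt_lbs, cars and avg_mpg are made up for the example.

```r
library(dplyr)

result <- mtcars %>%
  filter(cyl %in% c(4, 6)) %>%        # keep only 4- and 6-cylinder cars
  select(cyl, mpg, wt) %>%            # keep three columns
  mutate(wt_lbs = wt * 1000) %>%      # wt is recorded in thousands of pounds
  group_by(cyl) %>%                   # one group per cylinder count
  summarize(cars = n(), avg_mpg = mean(mpg)) %>%
  arrange(desc(avg_mpg))              # highest average mpg first

result
```

Each line takes the output of the line before it, so the chain reads top to bottom as a sequence of steps.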
Reading Machlis Chs. 5 & 6.
Seth C. Lewis, et al. “Big Data and Journalism: Epistemology, Expertise, Economics and Ethics,” Digital Journalism, 2015
Transforming and Analyzing Data dplyr.pdf – Andrew Ba Tran, Washington Post
For working with dates library(lubridate) Dealing-with-dates.pdf by Andrew Ba Tran
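A small sketch of what lubridate does, using two of the class dates from the agenda as input. It assumes the lubridate package is installed. The variable names week1 and week2 are made up for the example.

```r
library(lubridate)

# mdy() parses text written month/day/year into a Date
week1 <- mdy("1/14/2019")
week2 <- mdy("1/21/2019")

year(week1)                 # pull out the year
month(week1)                # pull out the month number
wday(week1, label = TRUE)   # day of the week as a labeled factor
week2 - week1               # difference between the two dates
```

Related parsers such as ymd() and dmy() handle other orderings of year, month and day.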
Resources:
–For analysis library(dplyr)
Exercises –
Agenda –2/4/2019 - Week 4
–Assignment #1 due Feb. 6: Managing Data / Static Graphic
–GGplot
–Conventions in coding
–R Markdown
–Work with sample Twitter data
Notes
–A handy explanation of ggplot and its components
If you’re using ggplot: plus it! For everything else: pipe it!
So geom_point() geom_bar() geom_boxplot()
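The "plus it" rule can be sketched with the built-in mtcars dataset, assuming ggplot2 is installed. The chart titles and axis labels are illustrative choices, not part of the class materials.

```r
library(ggplot2)

# ggplot() sets the data and aesthetic mappings; layers are added with +
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Weight vs. fuel economy",
       x = "Weight (1,000 lbs)", y = "Miles per gallon")

# Swap the geom to change the chart type, here a boxplot of mpg by cylinders
b <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# Typing p or b at the console draws the chart
```

Note that the layers are joined with + and not with the pipe, which is the point of the slogan above.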
Resources
Basic Charts in R
https://www.youtube.com/watch?v=1EUJ0tsVsUA&t=12s
GGplot Video from Andrew Ba Tran
https://www.youtube.com/watch?v=Sx7d7eGRSj0&t=9s
Reading
Machlis Chs. 7 & 8.
Samantha Sunne, “The Challenges and Possible Pitfalls of Data Journalism, and How You Can Avoid Them,” American Press Institute, 2016
charts_with_ggplot by Andrew Ba Tran, Washington Post
–
Agenda –2/11/2019 - Week 5
–Themes for data viz library(ggthemes)
–Work with sample Twitter data
–Terminology
ggplot
aes
**Resources**
Grammar of Graphics http://vita.had.co.nz/papers/layered-grammar.html
–Graphing GGplot 12-28.R Exercises from Machlis Ch. 9. Facets
Notes
–The pie chart focuses the reader on large percentages, and encourages the reader to think of the total.
–The stacked bar plot provides the same information, but makes it easier to accurately determine at a glance how large each group is out of the whole.
–This bar chart splits the categories horizontally, and draws attention to how the family members are ordered. It encourages the reader to think about the distribution rather than disconnected categories, and gives a better sense of scale.
Reading:
Machlis Chs. 9 & 10
Alberto Cairo, “The Functional Art,” Principles of Data Visualization.
Exercises
–Create R Markdown document, export to PDF, HTML
–Class Exercise - Graphing and Grouping Data Viz Exercise 2 From Ch 9 –
Agenda –2/18/2019 - Week 6
–Create a GitHub account. https://github.com/
–Follow this tutorial https://guides.github.com/activities/hello-world/
This class is intended to teach you modern workflow techniques for coding. A centerpiece of that workflow is GitHub. This is a website with a system that allows you to collaborate with other programmers on coding projects. It manages versions of software code and is very popular with the tech elite.
Your GitHub account, which is public, represents an important professional image. Prospective employers and collaborators will look at your GitHub account.
–Andrew Ba Tran Tutorials on GitHub Git and Github Pages http://learn.r-journalism.com/en/git/
Installing Git https://journalismcourses.org/courses/RC0818/installing_git.pdf
GIT https://journalismcourses.org/courses/RC0818/git.pdf
Connecting to Github https://journalismcourses.org/courses/RC0818/github.pdf http://learn.r-journalism.com/en/git/github/github/
Best Practices for Github http://learn.r-journalism.com/en/git/github_pages/github-pages/
–Loading and basic file management
Commit
Branch
Pull Request
Fork
Resources on GitHub
–GitHub flow
https://guides.github.com/introduction/flow/
–GitHub Guides
https://guides.github.com/
–Another GitHub guide
https://andrewbtran.github.io/NICAR/2018/workflow/docs/03-integrating_github.html
Reading:
Machlis Chs. 11 & 12
Installing Git for a Mac - Andrew Ba Tran
Exercises
Pair up.
Team 1 takes this code. Make changes.
Team 2 forks the code. Makes changes.
Pull & Commit
Workbooks, Markdown –
Agenda –2/25/2019 - Week 7
–Due Feb 27: Assignment #2. Visualization of Twitter data
–Analyzing Tweets from public officials. Dataset TK
–Study Twitter meta data
https://twittercommunity.com/ https://developer.twitter.com/en/docs
–Register as Twitter Developer https://developer.twitter.com/en/account/get-started
Reading Machlis Chs. 13 & 14.
Twitter meta data
Resources:
Exercises
–
Agenda –3/4/2019 - Week 8
–Quiz on GitHub
–R Markdown, Desktop Publishing
–Andrew Ba Tran - Week 5 Publishing http://learn.r-journalism.com/en/publishing/
–R Markdown http://learn.r-journalism.com/en/publishing/rmarkdown/rmarkdown/
–More R Markdown http://learn.r-journalism.com/en/publishing/more_rmarkdown/more-rmarkdown/
–Rendering html as an output in GitHub
https://rmarkdown.rstudio.com/lesson-9.html https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf
–R Markdown Formatting
Sizing images: <.img src="drawing.jpg" alt="drawing" width="200"/>
(Note: Remove the period before “img”, and use straight quotes around the attribute values) https://rpubs.com/RatherBit/90926
–Terminology
Render
Html
Markdown
Notes R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
summary(cars)
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
Including Plots
You can also embed plots, for example:
Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.
Reading
Machlis Chs. 15 & 16
Resources:
Exercises
Line breaks: Use HTML tags. Adding <br> will give a single line break – an option for when two trailing spaces at the end of a line are ignored.
–
Agenda –3/11/2019 - Week 9
–Extra Grad Assignment - Scraping Twitter. Due March 13.
–Basic text mining techniques
–Lubridate
Reading Machlis Chs. 17 & 18
Resources:
–Joining Dataframes in R
https://www.youtube.com/watch?v=gLg4D9bMIyc&t=13s
–Data Wrangling http://learn.r-journalism.com/en/wrangling/
http://learn.r-journalism.com/en/wrangling/dplyr/dplyr/
https://github.com/r-journalism/learn-chapter-3/blob/master/dplyr/pipes-dplyr.R
Exercises Data Wrangling-Text Mining in Twitter.R
–
You will make this in the class
Agenda –3/25/2019 - Week 10
–Due March 27: Assignment #3. Interactive Map
–Produce a basic interactive map in R, post on WordPress
–Andrew Ba Tran - Week 4 Mapping http://learn.r-journalism.com/en/mapping/
Reading “Connecting the Dots” by Jacob Harris (2015) and discuss how people should or should not be represented through news visualizations.
What is code? http://www.bloomberg.com/graphics/2015-paul-ford-what-is-code/
Resources: –Visual Narrative Tricks by Alberto Cairo https://www.youtube.com/watch?v=TSGaueL4Ggk
Spatial data maptools - work with shapefiles
Exercises –Maps in R 12-28-18.R
–
Agenda –4/1/2019 - Week 11
–Produce a basic interactive map in R, post on WordPress
–You will make this in the class